Abstract



A range counting problem is specified by a set $P$ of $|P| = n$ points in $\mathbb{R}^d$, an integer weight $x_p$
associated to each point $p \in P$, and a range space $\mathcal{R} \subseteq 2^P$. Given a query range $R \in \mathcal{R}$, the output is
$R(x) = \sum_{p \in R} x_p$. The average squared error of an algorithm $\mathcal{A}$ is
$\frac{1}{|\mathcal{R}|} \sum_{R \in \mathcal{R}} (\mathcal{A}(R, x) - R(x))^2$. Range
counting for different range spaces is a central problem in Computational Geometry.

We study $(\varepsilon, \delta)$-differentially private algorithms for range counting. Our main results are for the range
space given by hyperplanes, that is, the halfspace counting problem. We present an $(\varepsilon, \delta)$-differentially
private algorithm for halfspace counting in $d$ dimensions which is $O(n^{1-1/d})$ approximate for average
squared error. This contrasts with the $\Omega(n)$ lower bound established by the classical result of Dinur
and Nissim [12] on approximation for arbitrary subset counting queries. We also show a matching lower
bound of $\Omega(n^{1-1/d})$ approximation for any $(\varepsilon, \delta)$-differentially private algorithm for halfspace counting.

Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy
measure and bound approximation of $(\varepsilon, \delta)$-differentially private algorithms for range counting
queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical
combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds. This approach also
yields a lower bound of $\Omega((\log n)^{d-1})$ for $(\varepsilon, \delta)$-differentially private orthogonal range counting in $d$
dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an
approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain
$(\varepsilon, \delta)$-differentially private algorithms for range counting with polynomially bounded shatter function range
spaces.

1 Introduction 

A range counting problem is specified by a set $P$ of size $|P| = n$, and a range space $\mathcal{R} \subseteq 2^P$. Given a
query range $R \in \mathcal{R}$, the output is $|\{p \in P \cap R\}|$. More generally, each point $p \in P$ has an integer weight
$x_p$ and the range returns $R(x) = \sum_{p \in R} x_p$. This problem is fundamental in Computational Geometry and
a workhorse in applications, for various examples of range spaces from axis-parallel boxes (orthogonal range
counting), to regions bounded by hyperplanes (halfspace counting) and beyond (e.g., simplices). Orthogonal
range counting is commonly used in databases and data analysis. Halfspace counting is not only interesting
in itself, but general algebraic range counting can be "lifted" to a higher dimension and encoded as halfspace
counting [55].

We study privacy of range counting. In private range counting the set $P$ of points as well as the range
space $\mathcal{R}$ are considered public information, while the point weights $x_p$ are considered private (and may
denote, e.g., the number of users at a geographic location). As the exact solution can reveal the private weights,
we need to turn to approximate solutions. We define the average squared error of an algorithm $\mathcal{A}$ for range
counting as $\frac{1}{|\mathcal{R}|} \sum_{R \in \mathcal{R}} (\mathcal{A}(R, x) - R(x))^2$. For privacy, we adopt the well-established notion of differential
privacy. A mechanism $\mathcal{M} = \{M_n\}$ is $(\varepsilon, \delta)$-differentially private if for every $n$, every $x, x'$ with $\|x - x'\|_1 \le 1$,
and every measurable $S \subseteq \mathbb{R}^d$, the map $M_n$ satisfies

$$\Pr[M_n(x) \in S] \le e^{\varepsilon} \Pr[M_n(x') \in S] + \delta. \qquad (1)$$
Surprisingly, very little is known about private range counting. Applying methods of differential privacy
from first principles (Laplace noise and the basic composition theorem of differential privacy) will add large
noise to each output: variance $\Omega(n^2)$ in the case of halfspace counting in the plane. More generally, let
$A$ be an incidence matrix for a range space $\mathcal{R}$ (i.e. a matrix whose rows are the indicator vectors of all ranges
$R \in \mathcal{R}$) and let $x$ be the weights. The problem of computing $Ax$ is the range counting problem. The average
squared error of an approximate algorithm $\mathcal{A}$ is $\frac{1}{m}\|\mathcal{A}(x) - Ax\|_2^2$. In general, we can consider this problem for
any $A \in \{0,1\}^{m \times n}$, not necessarily ones that correspond to natural ranges from some constant dimensional
geometric space. This is the predicate counting problem, well studied in differential privacy. It is
known that no mechanism that has average squared error $o(n)$ can be $(\varepsilon, \delta)$-differentially private [12, 15].
However, the lower bounds are obtained using random $A$'s that will not correspond to specific range spaces
of interest. No super-constant lower bounds are known against $(\varepsilon, \delta)$-differential privacy for natural problems
like halfspace or orthogonal range counting in constant dimensional space.¹
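
As a concrete illustration of this linear-algebraic view, the following minimal Python sketch builds the incidence matrix of a toy range space, computes the exact answers $Ax$, and evaluates the average squared error of some approximate answers. The helper names (`incidence_matrix`, `range_counts`, `average_squared_error`) and the toy data are assumptions made for illustration only; they do not appear in the paper.

```python
def incidence_matrix(points, ranges):
    """Rows are the indicator vectors of the ranges over the given points."""
    return [[1 if contains(p) else 0 for p in points] for contains in ranges]

def range_counts(A, x):
    """Exact answers Ax: one weighted count per range."""
    return [sum(a * w for a, w in zip(row, x)) for row in A]

def average_squared_error(answers, A, x):
    """(1/m) * ||answers - Ax||_2^2, where m is the number of ranges."""
    exact = range_counts(A, x)
    return sum((a - e) ** 2 for a, e in zip(answers, exact)) / len(A)

# Hypothetical toy instance: 4 points on a line and 3 one-sided ranges.
points = [0.0, 1.0, 2.0, 3.0]
ranges = [lambda p, t=t: p <= t for t in (0.5, 1.5, 2.5)]
A = incidence_matrix(points, ranges)
x = [2, 0, 1, 5]            # private integer weights
exact = range_counts(A, x)  # the answers a private mechanism must perturb
noisy = [3, 2, 1]           # some approximate answers
err = average_squared_error(noisy, A, x)
```

Here each range is represented by a membership predicate, so the same helpers would apply to halfspaces or boxes by swapping the predicate.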

Our results are for $(\varepsilon, \delta)$-differentially private range counting, and use the combinatorial structure of $A$'s
for range spaces. Our main application is halfspace counting, but our approach is general and yields other
results too.

• (Halfspace counting upper bound) The (primal) shatter function of $\mathcal{R}$ is defined as $\pi_{\mathcal{R}}(s) = \max_{X \in \binom{P}{s}} |\mathcal{R}|_X|$
(i.e. the number of distinct sets in the restriction $\mathcal{R}|_X$). The shatter function of $\mathcal{R}$ defined by halfspaces in
$d$ dimensions is bounded as $\pi_{\mathcal{R}}(s) = O(s^d)$.

We show that there is an $(\varepsilon, \delta)$-differentially private range counting mechanism that achieves $O(n^{1-1/d})$
average squared error for range spaces with shatter function bounded by $O(s^d)$, and therefore for
$d$-dimensional halfspace range counting.

Our upper bound shows that previous lower bounds [12, 15] for general $A$'s indeed do not apply to
halfspace range counting. Our algorithm runs in time polynomial in $n$ and $m$. Previous work on this
problem is incomparable. Work by Blum, Ligett and Roth [4] gave a non-constructive squared error upper
bound of $O(d^2 n^{4/3})$ for range spaces with VC-dimension $d$ and a matching constructive bound for halfspace
range counting for $(\varepsilon, 0)$-differential privacy with a slightly different objective. Since the shatter function of a
range space with VC-dimension $d$ is bounded by $O(s^d)$, our result also implies a constructive approximation
upper bound of $O(n^{1-1/d})$ for VC-dimension $d$ range spaces.

Our approach relies on prior work [25] to decompose the range space into a logarithmic number of
range spaces, some of them consisting only of small ranges, and some containing a small number of distinct
ranges. We exploit this trade-off between maximum range size and number of distinct ranges by combining
randomized response and Laplace noise based differentially private mechanisms, but this balancing still
leaves us with large noise in some cases. Nevertheless, we can bound the average privacy loss over the points
$p \in P$. Our main idea is to use this approach to preserve privacy for most points $p \in P$; the shatter function
bound does not increase for restrictions of $P$ and $\mathcal{R}$ and we can recurse on the remaining points of $P$. This
argument is inspired by partial coloring methods used in discrepancy theory.

• (Range counting lower bound) For halfspace counting in $d$ dimensions, we show that any mechanism that has
average squared error within $o(n^{1-1/d})$ is not $(\varepsilon, \delta)$-differentially private for any constant $\varepsilon$ and $\delta$. We prove
this lower bound using a notion of discrepancy where, in contrast to the standard notion where $\{+1, -1\}$
colorings are considered, we allow $\{0, +1, -1\}$ colorings subject to budget constraints on the number of $\{+1, -1\}$ entries.
The budget constraints allow us to relate this notion of discrepancy to the classical one. Once the approach
via the correct notion of discrepancy is developed, the mechanics are simple. Lower bounds will follow from
combinatorial analysis of the discrepancy of range spaces. For orthogonal range counting, our approach
immediately gives a lower bound of $\Omega((\log n)^{d-1})$ on the average squared error of any $(\varepsilon, \delta)$-differentially
private mechanism. The best upper bound in this setting is the work of Chan, Shi, and Song [5] who give
an algorithm with average squared error $O((\log n)^{2d})$. No previous super-constant lower bounds are known
for this problem even for large constant $d$. We note that proving a tight lower bound on the combinatorial
discrepancy of axis-aligned boxes in $d$ dimensions is a major open problem in discrepancy theory, and any
improvement to the current discrepancy lower bound will yield a corresponding improvement in lower bounds
for privacy.



1 Constant lower bounds follow from the work of Roth [27] as well as from reductions from lower bounds for conjunction 
queries. 



In Section 2 we review related prior work. In Section 3 we define concepts we need, including differential
privacy and suitable notions of discrepancy. In Section 4 we present our lower bounds, and in Section 5 the
upper bounds. We describe extensions and alternative algorithmic solutions in Section 6.

2 Prior Work 

There is a rich and growing literature on solving counting problems while satisfying strong privacy guarantees. 
We will survey the prior work that is most relevant to our results. 

In a seminal paper, Dinur and Nissim [12] initiated the study of the limits of output perturbation in
answering arbitrary counting queries privately. They showed that if an algorithm $\mathcal{A}$ satisfies $\|\mathcal{A}(x) - Ax\|_\infty^2 =
o(n)$ for a random 0-1 matrix $A$, then an adversary can reconstruct $x$ almost exactly, implying that the
algorithm is not $(\varepsilon, \delta)$-differentially private for any constant $\varepsilon, \delta$.² There is relatively little prior work on
negative results for $(\varepsilon, \delta)$-differential privacy for natural restrictions of $A$. An exception is the work on
lower bounding the noise necessary to privately answer conjunction queries [24, 11]. Conjunction queries
on a database with $d$ attributes can be reduced to answering orthogonal range counting or halfspace range
counting queries in $d$ dimensions. When $d$ is constant, the lower bounds on conjunction queries imply a
lower bound of $C^d$ (for an absolute constant $C > 1$) on the average squared error necessary to answer
$d$-dimensional halfspace or orthogonal queries privately (here and in the remainder of this section we suppress
dependence on $\varepsilon$, $\delta$, and the probability of failure). In other related work, Roth [27] showed that linear
queries with fat shattering dimension $D$ require squared noise $\Omega(D^2)$ to preserve privacy. The fat shattering
dimension reduces to the VC-dimension for counting queries, and has value $d + 1$ for the range space of
halfspaces in $d$ dimensions. No super-constant lower bounds were previously known for $(\varepsilon, \delta)$-differential
privacy for the halfspace range counting or orthogonal range counting problems in constant dimensional
space.

The study of private range counting for restricted range spaces was initiated with the work of Blum,
Ligett, and Roth [4], who, using an argument based on epsilon nets, showed that queries of VC dimension $d$
can be answered with worst-case squared noise $O(d^2 n^{4/3})$. Their algorithm is not computationally efficient,
but they gave efficient algorithms with comparable guarantees for the interval range counting and halfspace
range counting problems. Although their error bound is inferior to ours (when the size of the database
is comparable to the universe size), the models are not directly comparable. While we consider a finite
universe, they consider a continuous space, but give relaxed utility guarantees, namely that each query
answer is accurate for a halfspace close to the query halfspace. Additionally, their algorithms satisfy the
stronger notion of $(\varepsilon, 0)$-differential privacy and accommodate the regime where $\|x\|_1$ is public and bounded
by $n$ and $P$ is much larger.

For interval queries, the work of Blum, Ligett, and Roth was subsequently improved by Xiao, Wang, and
Gehrke [32] (in the regime where database size and universe size are comparable), who gave a polylogarithmic
noise upper bound via the wavelet transform. A related algorithm that achieves an average squared error
upper bound of $O((\log n)^{3d})$ for $d$-dimensional orthogonal range counting was given by Chan, Shi, and
Song [5]. We note that if we relax the privacy guarantee of Chan, Shi, and Song to $(\varepsilon, \delta)$-differential privacy,
their algorithm can be analyzed to provide average squared error $O((\log n)^{2d})$.

Much subsequent work has focused on answering m arbitrary queries efficiently with squared error linear 
in n and polylogarithmic in m [1 6 ,251 HH1 HD US]- A related line of work investigates the problem of answering 
conjunction queries with optimal error [18, 2].

Prior work for $(\varepsilon, 0)$-differential privacy. Stronger lower bounds can be shown when $\delta = 0$, and there
are known separations between the cases $\delta = 0$ and $\delta > 0$, even when $\delta$ is superpolynomially small [11].
Hardt and Talwar [22] gave a lower bound for linear queries based on geometric properties of the query
matrix $A$. De [11] simplified and extended their lower bound results. Blum, Ligett, and Roth [4] showed
that no $(\varepsilon, 0)$-differentially private mechanism can answer interval queries with any nontrivial noise when
the universe is continuous.

Discrepancy theory. For background in discrepancy theory we refer the reader to the books of Chazelle [7]
and Matousek [26]. Chazelle provides an overview of the applications of discrepancy theory to computer
science, while Matousek gives a survey of discrepancy theory results for geometric range spaces.

2 Our methods based on discrepancy allow us to re-prove the lower bound of Dinur and Nissim, as well as the version of
Dwork and Yekhanin [17] that uses an explicit A.

Geometric range counting. Geometric range counting and the closely related problems of range sums 
and range searching have a rich history in computational geometry. We refer the reader to the survey of 
Agarwal and Erickson [1] for background.

3 Preliminaries 

We typeset vectors and matrices as $x$, $A$ and their elements as $x_i$, $A_{ij}$. We denote the $i$-th row of $A$ as
$A_{i*}$ and the $j$-th column as $A_{*j}$. Given a matrix $A$, the function $\mathrm{col}(A)$ equals the number of columns of
$A$. For a matrix $A$ with $n$ columns, and a set $S \subseteq [n]$ we use $A|_S$ to denote the submatrix of $A$ consisting
of the columns corresponding to elements of $S$ (with duplicated rows removed). Similarly, for a range space
$\mathcal{R}$ with incidence matrix $A$, the range space $\mathcal{R}|_S$ is the one corresponding to the incidence matrix $A|_S$. We
denote the $i$-th standard basis vector $(0, \ldots, 0, 1, 0, \ldots, 0)^T$ (where 1 is in the $i$-th coordinate) as $e_i$. For a
set $P$ we denote the collection of subsets of $P$ of size $s$ as $\binom{P}{s}$.

3.1 Range Counting 

We will use the definitions for range counting, average squared error, orthogonal and halfspace range
counting, as well as the linear algebraic notation introduced in the Introduction. We also consider
worst-case squared error, which for an algorithm $\mathcal{A}$ and a range space with incidence matrix $A$ is
$\|\mathcal{A}(x) - Ax\|_\infty^2 \ge \frac{1}{m}\|\mathcal{A}(x) - Ax\|_2^2$. We give all our lower bounds in average squared error and state our upper bounds in terms
of both average and worst-case squared error.

The VC-dimension of a range space $\mathcal{R}$ is defined as the size of the largest set $X \subseteq P$ such that $\mathcal{R}|_X = 2^X$.
The (primal) shatter function of $\mathcal{R}$ is defined as $\pi_{\mathcal{R}}(s) = \max_{X \in \binom{P}{s}} |\mathcal{R}|_X|$ (i.e. the number of distinct sets
in $\mathcal{R}|_X$).

Fact 1 ([26]). If the VC-dimension of $\mathcal{R}$ is $d$, then $\pi_{\mathcal{R}}(s) = O(s^d)$. Conversely, if $\pi_{\mathcal{R}}(s) = s^{O(1)}$ then the
VC-dimension of $\mathcal{R}$ is constant.

Fact 2 ([26]). The VC-dimension of the range space $\mathcal{R}$ induced on $P$ by all halfspaces in $\mathbb{R}^d$ is $d + 1$. The
shatter function of $\mathcal{R}$ is bounded as $\pi_{\mathcal{R}}(s) = O(s^d)$.
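
To make the two definitions concrete, here is a small brute-force Python sketch that evaluates the shatter function and the VC-dimension of a finite range space. It is exponential in the number of points and is meant only as an illustration; the function names and the interval example are assumptions, not part of the paper.

```python
from itertools import combinations

def shatter_function(ranges, points, s):
    """Max over s-subsets X of the points of the number of distinct sets R & X."""
    best = 0
    for X in combinations(points, s):
        Xs = frozenset(X)
        traces = {R & Xs for R in ranges}  # distinct restrictions to X
        best = max(best, len(traces))
    return best

def vc_dimension(ranges, points):
    """Size of the largest subset X whose restriction contains all 2^|X| subsets."""
    d = 0
    for s in range(1, len(points) + 1):
        if shatter_function(ranges, points, s) == 2 ** s:
            d = s
    return d

# Hypothetical example: all intervals (including the empty one) over 5 points on a line.
points = list(range(5))
ranges = {frozenset(range(a, b)) for a in range(6) for b in range(a, 6)}
pi_3 = shatter_function(ranges, points, 3)
vc = vc_dimension(ranges, points)
```

For intervals the shatter function grows quadratically, which matches the polynomial bound of Fact 1 for a range space of VC-dimension 2.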

3.2 Differential Privacy 

For any two sets $\mathcal{U}$ (the universe) and $Y$, a mechanism $\mathcal{M}$ over $\mathcal{U}$ with range $Y$ is a family of maps $\{M_n\}$,
$M_n : \mathcal{U}^n \to \rho(Y)$, where $\rho(Y)$ is the set of random variables that take values in $Y$. For the rest of this paper,
we will focus on mechanisms over $\mathbb{Z}$ or over $\{0, 1\}$, with range $\mathbb{R}^m$.

Definition 1. A mechanism $\mathcal{M} = \{M_n\}$ over (a subset of) $\mathbb{Z}$ with range $Y$ is $(\varepsilon, \delta)$-differentially private if
for every $n$, every $x, x'$ with $\|x - x'\|_1 \le 1$, and every measurable $S \subseteq Y$, the map $M_n$ satisfies

$$\Pr[M_n(x) \in S] \le e^{\varepsilon} \Pr[M_n(x') \in S] + \delta.$$

For lower bounds we use the following claim, which implies that being able to decode most of the input 
from the output contradicts differential privacy. 

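Lemma 1 ([11]). Let $\mathcal{M} = \{M_n\}$ be a mechanism such that for some $n$ there exists a (not necessarily
efficient) algorithm $\mathcal{A}$ such that

$$\forall x \in \mathbb{Z}^n : \Pr[\|\mathcal{A}(M_n(x)) - x\|_1 \ge \alpha n] \le \beta.$$

Then there exist $\varepsilon = \varepsilon(\alpha, \beta)$ and $\delta = \delta(\alpha, \beta)$ such that the mechanism $\mathcal{M}$ is not $(\varepsilon, \delta)$-differentially private.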

A basic mechanism to achieve differential privacy with $\delta = 0$ is the Laplace noise mechanism, first
proposed in [13]. Let us here and for the rest of the paper denote by $\mathrm{Lap}(s)$ the Laplace distribution
centered at 0 with scale parameter $s$.



Lemma 2 ([13]). Let $f$ be any real-valued function which for any $x, x' \in \mathbb{Z}^n$ such that $\|x - x'\|_1 \le 1$ satisfies
$|f(x) - f(x')| \le 1$. Then the mechanism that on input $x$ outputs $f(x) + \mathrm{Lap}(1/\varepsilon)$ satisfies $(\varepsilon, 0)$-differential
privacy.
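
The Laplace noise mechanism is short enough to sketch directly. The following Python snippet is a minimal illustration that assumes a function $f$ of sensitivity at most 1 (here a single range count) and samples $\mathrm{Lap}(s)$ as the difference of two independent exponential variables with mean $s$. The names `sample_laplace` and `laplace_mechanism` are hypothetical, and floating-point sampling of this kind is not meant as a hardened privacy implementation.

```python
import random

def sample_laplace(scale, rng):
    """One draw from Lap(scale), centered at 0, as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(f, x, eps, rng):
    """Release f(x) + Lap(1/eps); assumes |f(x) - f(x')| <= 1 for neighboring x, x'."""
    return f(x) + sample_laplace(1.0 / eps, rng)

# Hypothetical example: one range containing points 0 and 2 of a 4-point database.
rng = random.Random(0)
count = lambda x: x[0] + x[2]
x = [2, 0, 1, 5]
samples = [laplace_mechanism(count, x, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((t - mean) ** 2 for t in samples) / len(samples)
```

The empirical variance should be close to $2/\varepsilon^2$, the variance of $\mathrm{Lap}(1/\varepsilon)$.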

The composition of mechanisms $\mathcal{M}^1 = \{M_n^1\}, \ldots, \mathcal{M}^s = \{M_n^s\}$ is the mechanism that on input $x \in \mathbb{Z}^n$
outputs $(M_n^1(x), \ldots, M_n^s(x))$. We need the following composition lemma first proved in [13].

Lemma 3 ([13]). Let the mechanisms $\mathcal{M}^1, \ldots, \mathcal{M}^s$ satisfy, respectively, $(\varepsilon_1, \delta_1), \ldots, (\varepsilon_s, \delta_s)$-differential
privacy. The composition $\mathcal{M}$ of the mechanisms satisfies $(\sum_j \varepsilon_j, \sum_j \delta_j)$-differential privacy.

We also need a stronger result, which is a straightforward extension of the composition theorem of Dwork,
Rothblum, and Vadhan [TJ]. To state the result we define a notion of privacy loss. Following [13], let us first
define the maximum divergence of two random variables $a$ and $b$ as

$$D_\infty(a \| b) = \max_{S} \ln \frac{\Pr[a \in S]}{\Pr[b \in S]},$$

where $S$ ranges over measurable subsets of the support of $b$. Note that a mechanism $\mathcal{M} = \{M_n\}$ is
$(\varepsilon, 0)$-differentially private if and only if for every $n$ and any $x, x'$ with $\|x - x'\|_1 \le 1$, we have $D_\infty(M_n(x) \| M_n(x')) \le \varepsilon$
and $D_\infty(M_n(x') \| M_n(x)) \le \varepsilon$.

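Definition 2. Let $\mathcal{M}$ be a composition of $\mathcal{M}^1, \ldots, \mathcal{M}^s$. The privacy loss of $i \in [n]$ for the $j$-th output is

$$\ell_{\mathcal{M}}(i, j) = \max_{x,\, x' = x \pm e_i} D_\infty(M_n^j(x) \| M_n^j(x')).$$

The ($\ell_2$) privacy loss of $i \in [n]$ is $L_{\mathcal{M}}(i) = \sqrt{\sum_{j \in [s]} \ell_{\mathcal{M}}(i, j)^2}$.

Lemma 4. Let $\mathcal{M}$ be a composition of $\mathcal{M}^1, \ldots, \mathcal{M}^s$ and let $\varepsilon \ge \max_{i \in [n]} L_{\mathcal{M}}(i)$. Then, for any $\delta > 0$, $\mathcal{M}$
satisfies $(\sqrt{2 \ln(1/\delta)}\,\varepsilon, \delta)$-differential privacy.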

Note that for the range counting problem, the privacy loss is defined for a point p. 
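
The difference between the basic composition bound of Lemma 3 and the $\ell_2$-based bound of Lemma 4 can be seen numerically. The sketch below is only an illustration of the two formulas as stated; the function names and the example of a point lying in 100 ranges, each answered with per-output loss 0.01, are assumptions.

```python
import math

def basic_composition(params):
    """Lemma 3: the (eps_j, delta_j) pairs add up coordinate-wise."""
    return (sum(e for e, _ in params), sum(d for _, d in params))

def l2_privacy_loss(per_output_losses):
    """Definition 2: L(i) = sqrt(sum_j l(i, j)^2) for a single index i."""
    return math.sqrt(sum(l * l for l in per_output_losses))

def lemma4_guarantee(losses_by_index, delta):
    """Lemma 4 with eps = max_i L(i): returns (sqrt(2 ln(1/delta)) * eps, delta)."""
    eps = max(l2_privacy_loss(ls) for ls in losses_by_index)
    return (math.sqrt(2.0 * math.log(1.0 / delta)) * eps, delta)

# Hypothetical example: one point contained in 100 ranges, per-output loss 0.01 each.
losses = [0.01] * 100
basic = basic_composition([(l, 0.0) for l in losses])
l2 = l2_privacy_loss(losses)
advanced = lemma4_guarantee([losses], 1e-6)
```

In this example the basic bound gives $\varepsilon = 1$, while the $\ell_2$ loss is $0.1$ and Lemma 4 yields a smaller $\varepsilon$ at the price of $\delta = 10^{-6}$.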

3.3 Discrepancy 

Here we define a modified notion of discrepancy. In Section 4 we show that this modified notion of
discrepancy is useful in carrying out Dinur-Nissim type attacks on privacy.

Definition 3. For any $A \in \mathbb{R}^{m \times n}$, we define

$$\mathrm{disc}_{p,\alpha}(A) = \min_{\substack{x \in \{0, \pm 1\}^n \\ \|x\|_1 \ge \alpha\,\mathrm{col}(A)}} \|Ax\|_p,$$

$$\mathrm{herdisc}_{p,\alpha}(A) = \max_{S \subseteq [n]} \mathrm{disc}_{p,\alpha}(A|_S).$$

The standard notions of discrepancy and hereditary discrepancy correspond to the special cases $\mathrm{disc} =
\mathrm{disc}_{\infty,1}$ and $\mathrm{herdisc} = \mathrm{herdisc}_{\infty,1}$. The cases $\mathrm{disc}_{2,1}$ and $\mathrm{herdisc}_{2,1}$ have also been extensively studied,
especially as means of proving lower bounds on disc and herdisc. On the other hand, $\mathrm{disc}_{p,0}$ is
trivially identically zero, since $x = 0$ is allowed. Next, we exhibit a connection between $\mathrm{herdisc}_{p,1}$ and $\mathrm{herdisc}_{p,\alpha}$, for
$\alpha \in (0, 1)$ and any $p$.
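
For very small matrices, both quantities of Definition 3 can be evaluated by exhaustive search, which may help in checking the special cases just mentioned. The following Python sketch is a brute-force illustration only (it enumerates all $3^n$ colorings); the function names and the example matrix are assumptions.

```python
from itertools import combinations, product

def p_norm(v, p):
    """l_p norm of a vector; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def disc(A, p, alpha):
    """min ||Ax||_p over x in {0,+1,-1}^n with ||x||_1 >= alpha * n."""
    n = len(A[0])
    best = float("inf")
    for x in product((-1, 0, 1), repeat=n):
        if sum(abs(t) for t in x) >= alpha * n:
            Ax = [sum(a * t for a, t in zip(row, x)) for row in A]
            best = min(best, p_norm(Ax, p))
    return best

def restrict(A, S):
    """A|_S: keep the columns in S and remove duplicated rows."""
    rows = {tuple(row[j] for j in S) for row in A}
    return [list(r) for r in sorted(rows)]

def herdisc(A, p, alpha):
    """max over nonempty column subsets S of disc(A|_S)."""
    n = len(A[0])
    return max(disc(restrict(A, S), p, alpha)
               for r in range(1, n + 1)
               for S in combinations(range(n), r))

# Hypothetical toy incidence matrix: 3 ranges over 3 points.
A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
inf = float("inf")
```

With $\alpha = 1$ only full $\pm 1$ colorings are allowed, which recovers the standard notions; with $\alpha = 0$ the all-zero vector is allowed and the minimum is 0.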

Lemma 5. Let $f(s) = \max_{S \subseteq [n] : |S| \le s} \mathrm{disc}_{p,\alpha}(A|_S)$. Then $\mathrm{disc}_{p,1}(A) \le \sum_{i=0}^{\infty} f((1-\alpha)^i n)$, and, therefore,
$\mathrm{herdisc}_{p,1}(A) \le \sum_{i=0}^{\infty} f((1-\alpha)^i n)$.

Proof. We will find an assignment $x \in \{\pm 1\}^n$ such that $\|Ax\|_p \le \sum_{i=0}^{\infty} f((1-\alpha)^i n)$, which is sufficient to
prove the lemma. Let $x' \in \{0, \pm 1\}^n$ be such that $\|Ax'\|_p \le f(n)$ and $\|x'\|_1 \ge \alpha n$. Let $S = \{i : x'_i = 0\}$.
Since $\|x'\|_1 \ge \alpha n$, $|S| \le (1-\alpha)n$. We recurse to find an assignment $x'' \in \{\pm 1\}^S$ such that $\|(A|_S)x''\|_p \le
\sum_{i=0}^{\infty} f((1-\alpha)^i |S|) \le \sum_{i=1}^{\infty} f((1-\alpha)^i n)$. Set $x_i = x'_i$ when $i \notin S$ and $x_i = x''_i$ when $i \in S$. □
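
The recursion in this proof is the partial coloring scheme: color at least an $\alpha$ fraction of the remaining columns, then recurse on the uncolored ones. The Python sketch below illustrates it with a brute-force partial coloring step on a tiny matrix. It is an assumption-laden toy: the helper names are hypothetical, the partial coloring is found by exhaustive search, and for simplicity the restricted matrix keeps duplicated rows (which makes no difference for $p = \infty$).

```python
from itertools import product

def norm(v, p):
    """l_p norm of a vector; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def apply_cols(A, cols, vals):
    """Multiply the columns `cols` of A by the entries of `vals`."""
    return [sum(row[j] * v for j, v in zip(cols, vals)) for row in A]

def best_partial(A, cols, alpha, p):
    """Brute-force minimizer over {0,+1,-1}^cols with >= alpha*len(cols) nonzeros."""
    feasible = (x for x in product((-1, 0, 1), repeat=len(cols))
                if sum(map(abs, x)) >= alpha * len(cols))
    return min(feasible, key=lambda x: norm(apply_cols(A, cols, x), p))

def full_coloring(A, alpha, p):
    """Repeat partial colorings on the still-uncolored columns until none remain."""
    n = len(A[0])
    x, live, stage_norms = [0] * n, list(range(n)), []
    while live:
        part = best_partial(A, live, alpha, p)
        stage_norms.append(norm(apply_cols(A, live, part), p))
        for j, v in zip(live, part):
            if v != 0:
                x[j] = v
        live = [j for j, v in zip(live, part) if v == 0]
    return x, stage_norms

A = [[1, 1, 1], [1, 1, 0], [0, 1, 1]]  # hypothetical toy matrix
x, stages = full_coloring(A, 0.5, float("inf"))
total = norm(apply_cols(A, range(3), x), float("inf"))
```

Each stage colors at least one column when $\alpha > 0$, so the loop terminates, and the discrepancy of the final coloring is at most the sum of the per-stage discrepancies.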



Lemma 5 and the observation $\mathrm{herdisc}_{p,\alpha}(A) = \max_{s=1}^{n} f(s)$ imply that for any $A$,

$$\mathrm{herdisc}_{p,1}(A) \le \frac{\log n}{\log 1/(1-\alpha)}\,\mathrm{herdisc}_{p,\alpha}(A).$$

However, using Lemma 5 directly and the observation that a restriction of a halfspace range space (or a range
space of axis-aligned boxes) is a range space of the same kind, we get stronger lower bounds for $\mathrm{herdisc}_{p,\alpha}$.
Below we list several results that can be derived in this way from known results in combinatorial
discrepancy theory [7] [55], together with a specific reference to the discrepancy lower bound used to
derive each result. We provide a full proof of the first result; the remaining proofs follow analogous reasoning.

Lemma 6 ([9]). For infinitely many $n$ there exists a set of $n$ points $P$ and $m$ halfspaces $H_1, \ldots, H_m$ in
$\mathbb{R}^d$ ($d = O(1)$) such that the following holds. Let $A$ denote the incidence matrix of the collection of sets
$\{H_j \cap P, j \in [m]\}$. Then for any $\alpha = \Omega(1)$, $\mathrm{herdisc}_{2,\alpha}(A) = \Omega(m^{1/2} n^{1/2 - 1/(2d)})$.

Proof. Assume for contradiction that all but finitely many $m \times n$ incidence matrices $A$ of halfspaces in
$\mathbb{R}^d$ have hereditary $\alpha$-discrepancy $\mathrm{herdisc}_{2,\alpha}(A) = o(m^{1/2} n^{1/2 - 1/(2d)})$. By the results in [9], there exist
infinitely many sets of $n$ points $P$ and $m = \binom{n}{d}$ halfspaces $H_1, \ldots, H_m$ such that the incidence matrix $B$ of
$\{H_j \cap P, j \in [m]\}$ has hereditary discrepancy $\mathrm{herdisc}_{2,1}(B) = \Omega(m^{1/2} n^{1/2 - 1/(2d)})$. Let us fix any such set of points
and halfspaces and the corresponding incidence matrix $B$. Any restriction $B|_S$ for $S \subseteq P$ is also the incidence
matrix of sets induced by points and halfspaces, and by assumption, $\mathrm{herdisc}_{2,\alpha}(B|_S) = o(m^{1/2} |S|^{1/2 - 1/(2d)})$.
Plugging this bound into Lemma 5 we get $\mathrm{herdisc}_{2,1}(B) = o(m^{1/2} n^{1/2 - 1/(2d)})$, a contradiction. □

Lemma 7 ([29, 3]). For infinitely many $n$ there exists a set of $n$ points $P$ and $m$ axis-parallel boxes
$B_1, \ldots, B_m$ in $\mathbb{R}^d$ ($d = O(1)$) such that the following holds. Let $A$ denote the incidence matrix of the
collection of sets $\{B_j \cap P, j \in [m]\}$. Then for any $\alpha = \Omega(1)$, $\mathrm{herdisc}_{2,\alpha}(A) = \Omega(m^{1/2} (\log n)^{d/2 - 3/2})$.

Lemma 8 ([8]). For infinitely many $n$ there exists a set of $n$ points $P$ and $m$ axis-parallel boxes $B_1, \ldots, B_m$
in $\mathbb{R}^d$ ($d = \Omega(\log n)$) such that the following holds. Let $A$ denote the incidence matrix of the collection of
sets $\{B_j \cap P, j \in [m]\}$. Then for any $\alpha = \Omega(1)$, $\mathrm{herdisc}_{\infty,\alpha}(A) = n^{\Omega(1)}$.

Lemma 9 ([30]). For any $n$ and $m \ge n$ there exists a matrix $A \in \{0, 1\}^{m \times n}$ such that $\mathrm{herdisc}_{\infty,\alpha}(A) =
\Omega(\sqrt{n \log(2m/n)})$.

4 Lower Bounds for Privacy from Discrepancy 

Our main result in this section is a noise lower bound on $(\varepsilon, \delta)$-differentially private mechanisms that
approximate range counting queries for a host of natural geometric range spaces. Our main conceptual contribution
is in identifying $\mathrm{herdisc}_{p,\alpha}$ as the key quantity in showing lower bounds against $(\varepsilon, \delta)$-differential privacy via a
Dinur-Nissim type attack, and connecting this quantity to the standard notion of combinatorial discrepancy.

Theorem 1. For any $\alpha, \beta$, there exist $\varepsilon(\alpha, \beta)$ and $\delta(\alpha, \beta)$ such that no mechanism $\mathcal{M} = \{M_n\}$ over the
universe $\{0, 1\}$ with range $\mathbb{R}^m$ that for some $p$ satisfies

$$\forall x \in \{0, 1\}^n : \Pr[\|M_n(x) - Ax\|_p < \mathrm{disc}_{p,\alpha}(A)/2] \ge 1 - \beta$$

is $(\varepsilon, \delta)$-differentially private.

We extend the lower bound to $\mathrm{herdisc}_{p,\alpha}$. This allows us to use the connection between $\mathrm{herdisc}_{p,\alpha}$ and
standard discrepancy.

Corollary 1. For any $\alpha, \beta$, there exist $\varepsilon(\alpha, \beta)$ and $\delta(\alpha, \beta)$ such that no mechanism $\mathcal{M} = \{M_n\}$ over the
universe $\{0, 1\}$ with range $\mathbb{R}^m$ that for some $p$ satisfies

$$\forall x \in \{0, 1\}^n : \Pr[\|M_n(x) - Ax\|_p < \mathrm{herdisc}_{p,\alpha}(A)/2] \ge 1 - \beta$$

is $(\varepsilon, \delta)$-differentially private.




Proof. We claim that given $M_n$ and any set $S \subseteq [n]$, we can construct $M'_n$ that takes as input $x|_S$, is
$(\varepsilon, \delta)$-differentially private (with respect to $x|_S$), and satisfies

$$\forall x|_S : \Pr[\|M'_n(x|_S) - (A|_S)(x|_S)\|_p < \mathrm{herdisc}_{p,\alpha}(A|_S)/2] \ge 1 - \beta.$$

Then we can take $S$ such that $\mathrm{disc}_{p,\alpha}(A|_S) = \mathrm{herdisc}_{p,\alpha}(A)$, and the corollary follows from Theorem 1.

We define $M'_n$ as follows: $M'_n(x|_S)$ extends $x|_S$ to $x$ by setting $x_i = 0$ for all $i \notin S$ and outputs $M_n(x)$.
It is easy to verify that $M'_n$ satisfies the claimed properties. □

Theorem 1 follows from Lemma 1 and the following lemma.

Lemma 10. There exists a deterministic (not necessarily efficient) algorithm $\mathcal{A}$ that on input a matrix
$A \in \mathbb{R}^{m \times n}$ and a vector $y \in \mathbb{R}^m$ satisfying $\|y - Ax\|_p < \mathrm{disc}_{p,\alpha}(A)/2$ for some $x \in \{0, 1\}^n$, outputs a vector
$x' \in \{0, 1\}^n$ such that $\|x' - x\|_1 < \alpha n$.

Proof. Given $y$, $\mathcal{A}$ outputs an arbitrary $x' \in \{0, 1\}^n$ such that $\|Ax' - y\|_p < \mathrm{disc}_{p,\alpha}(A)/2$. Such an $x'$ exists,
since $\|Ax - y\|_p < \mathrm{disc}_{p,\alpha}(A)/2$ by assumption. We claim that $\|x - x'\|_1 < \alpha n$. For contradiction, assume
$\|x - x'\|_1 \ge \alpha n$. Notice that $x - x' \in \{0, \pm 1\}^n$. Then, by the definition of $\mathrm{disc}_{p,\alpha}$, $\|A(x - x')\|_p \ge \mathrm{disc}_{p,\alpha}(A)$.
By the triangle inequality, the assumption of the lemma, and the definition of $\mathcal{A}$, $\|A(x - x')\|_p \le \|Ax -
y\|_p + \|Ax' - y\|_p < \mathrm{disc}_{p,\alpha}(A)$, and we have reached a contradiction. □
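
The decoder in this proof can be written out as an exhaustive search. The Python sketch below is a toy illustration of the reconstruction attack, not an efficient algorithm: it computes $\mathrm{disc}_{p,\alpha}(A)$ by brute force and returns the first $x' \in \{0,1\}^n$ that is consistent with the noisy answers. All names and the example instance are assumptions.

```python
from itertools import product

def norm(v, p):
    """l_p norm of a vector; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def disc(A, p, alpha):
    """Brute-force disc_{p,alpha}(A) as in Definition 3."""
    n = len(A[0])
    return min(norm(matvec(A, x), p)
               for x in product((-1, 0, 1), repeat=n)
               if sum(map(abs, x)) >= alpha * n)

def decode(A, y, p, alpha):
    """Some x' in {0,1}^n with ||Ax' - y||_p < disc_{p,alpha}(A)/2, or None."""
    half = disc(A, p, alpha) / 2.0
    for cand in product((0, 1), repeat=len(A[0])):
        resid = [s - t for s, t in zip(matvec(A, cand), y)]
        if norm(resid, p) < half:
            return list(cand)
    return None

# Hypothetical toy instance: 3 ranges over 3 points, answers perturbed by small noise.
A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
alpha, p = 0.5, float("inf")
x = [1, 0, 1]
y = [s + e for s, e in zip(matvec(A, x), (0.4, -0.4, 0.3))]
x_rec = decode(A, y, p, alpha)
```

In this instance the noise is below half the discrepancy, so the decoded vector is within $\ell_1$ distance $\alpha n$ of the private input, which is what Lemma 1 needs to rule out differential privacy.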

Corollary 1 instantiated with $p = 2$, and Lemmas 6-8 imply an array of noise lower bounds for
approximating geometric range counting while satisfying $(\varepsilon, \delta)$-differential privacy.

Theorem 2. Any mechanism $\mathcal{M}$ that, for any $P$ in $\mathbb{R}^d$ with $|P| = n$ and $d = O(1)$, with constant probability
approximates the halfspace range counting problem within average squared error $o(n^{1-1/d})$ is not
$(\varepsilon, \delta)$-differentially private for any constant $\varepsilon$ and $\delta$.

Theorem 3. Any mechanism $\mathcal{M}$ that, for any $P$ in $\mathbb{R}^d$ with $|P| = n$ and $d = O(1)$, with constant probability
approximates the orthogonal range counting problem within average squared error $o((\log n)^{d-1})$ is not
$(\varepsilon, \delta)$-differentially private for any constant $\varepsilon$ and $\delta$.



Theorem 4. Any mechanism $\mathcal{M}$ that, for any $P$ in $\mathbb{R}^d$ with $|P| = n$ and $d = O(\log n)$, with constant
probability approximates the orthogonal range counting problem within average squared error $n^{o(1)}$ is not
$(\varepsilon, \delta)$-differentially private for any constant $\varepsilon$ and $\delta$.

We also note that Corollary 1 instantiated with $p = \infty$ and Lemma 9 imply a lower bound on the
worst case squared error for privately approximating $m$ arbitrary range counting queries where $m$ is much
larger than $n$.

Theorem 5. Any mechanism $\mathcal{M}$ that, for any range space $(P, \mathcal{R})$ ($|P| = n$, $|\mathcal{R}| = m$), with constant
probability approximates range counts for $\mathcal{R}$ with worst case squared error $o(n \log(2m/n))$ is not $(\varepsilon, \delta)$-differentially
private for any constant $\varepsilon$ and $\delta$.

The results of Dinur and Nissim [12] for $m = O(n)$ and $m = 2^n$ are special cases of Theorem 5. To the
best of our knowledge, this is the first lower bound that explicitly accounts for the dependence of error on
$m$ for arbitrary $m \ge n$.

5 Algorithm for Bounded Shatter Function Systems 

In this section we present an efficient (for constant $d$) $(\varepsilon, \delta)$-differentially private range counting algorithm
for range spaces with bounded shatter function. We prove the algorithm gives optimal average squared
error and almost optimal worst-case squared error bounds. The algorithm is based on a novel use of a
decomposition that was first constructed by Matousek [25] to prove optimal discrepancy upper bounds for
bounded shatter function range spaces. Even a careful application of known methods in differential privacy
together with the decomposition does not provide optimal error bounds directly; we, however, prove that
privacy can be satisfied for a constant fraction of $P$ while achieving optimal error bounds; then we recurse
on the remainder of $P$. Aside from the decomposition, this method of satisfying privacy for a fraction of the
database is inspired by partial coloring methods in discrepancy theory.

We will make essential use of the following lemma, due originally to Haussler. The lemma bounds the
size of an epsilon net in the Hamming metric.

Lemma 11 ([23]). Let $(P, \mathcal{R})$ be a range space with shatter function $\pi_{\mathcal{R}}(s) = O(s^d)$. Let $\Delta$ be an integer
less than $|P|$. Let $\mathcal{S} \subseteq \mathcal{R}$ be a collection of ranges such that for any two ranges $R_1, R_2 \in \mathcal{S}$, the symmetric
difference between $R_1$ and $R_2$ is at least $\Delta$. Then, $|\mathcal{S}| = O((|P|/\Delta)^d)$.



We construct collections of ranges with large pairwise distance $\Delta$ for geometrically growing values of
$\Delta$. Using the collections as finer and finer epsilon nets, we can represent each range in $\mathcal{R}$ as the union
and set difference of smaller and smaller ranges, while Lemma 11 allows us to control the number of such
ranges needed for each value of $\Delta$. We then approximate range counts for the ranges that make up the
decomposition; the trade-off between range size and number of distinct ranges allows us to balance the noise
incurred by randomized response and by using composition (Lemma 4).

We first detail the construction. Our presentation follows [26]. Let $(P, \mathcal{R})$ be a range space with shatter
function $\pi_{\mathcal{R}}(s) = O(s^d)$. Let $k = \lceil \log_2 n \rceil$. For each $i \in \{0, \ldots, k\}$, let $\mathcal{S}_i \subseteq \mathcal{R}$ be a maximal collection of
ranges such that the symmetric difference between any two ranges $R_1, R_2 \in \mathcal{S}_i$ is at least $n2^{-i}$. In particular,
$\mathcal{S}_k = \mathcal{R}$ and $\mathcal{S}_0 = \{\emptyset\}$. For each $R \in \mathcal{S}_i$, fix an $R' \in \mathcal{S}_{i-1}$ such that the symmetric difference between $R$ and $R'$
is at most $n2^{-i+1}$ (such a range exists by maximality of $\mathcal{S}_{i-1}$). Then we set $F(R) = R \setminus R'$ and $G(R) = R' \setminus R$,
so that $R' = (R \setminus F(R)) \cup G(R)$, $F(R) \subseteq R$, and $G(R) \cap (R \setminus F(R)) = \emptyset$. Define a new collection of ranges
$\mathcal{T}_i = \{F(R), G(R) : R \in \mathcal{S}_i\}$. We can start from $R \in \mathcal{R} = \mathcal{S}_k$ and apply the construction recursively, until
we have $\emptyset = ((\ldots((R \setminus F_k) \cup G_k) \ldots) \cup G_2) \setminus F_1$, where $F_i, G_i \in \mathcal{T}_i$. Backtracking to reconstruct $R$, we get

$$R = ((\ldots(F_1 \setminus G_2) \cup F_2 \ldots) \setminus G_k) \cup F_k. \qquad (2)$$

All union operations are on disjoint sets and any set is subtracted from a set that entirely contains it. 
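
The construction can be tried out on a small range space. The Python sketch below is an illustrative reading of it under stated assumptions: the maximal collections are built greedily, the parent $R'$ is taken to be a nearest element of the coarser collection, and ranges are represented as `frozenset`s of point indices. The names `greedy_net`, `decompose`, and `rebuild` are hypothetical, and the code assumes $n \ge 2$.

```python
import math

def greedy_net(ranges, threshold):
    """A maximal subfamily with pairwise symmetric difference >= threshold."""
    net = []
    for R in ranges:
        if all(len(R ^ Q) >= threshold for Q in net):
            net.append(R)
    return net

def decompose(ranges, n):
    """For each range R, the lists (F_1..F_k) and (G_1..G_k) of equation (2)."""
    k = math.ceil(math.log2(n))
    # nets[0] = {empty set}, nets[k] = all ranges, greedy nets in between.
    nets = ([[frozenset()]]
            + [greedy_net(ranges, n / 2 ** i) for i in range(1, k)]
            + [list(ranges)])
    out = {}
    for R in ranges:
        cur, F, G = R, [], []
        for i in range(k, 0, -1):
            parent = min(nets[i - 1], key=lambda Q: len(cur ^ Q))
            F.append(cur - parent)   # F_i = R \ R'
            G.append(parent - cur)   # G_i = R' \ R
            cur = parent
        out[R] = (F[::-1], G[::-1])
    return out, k

def rebuild(F, G):
    """Evaluate the right-hand side of (2): start from the empty set, go up the levels."""
    cur = frozenset()
    for Fi, Gi in zip(F, G):
        cur = (cur - Gi) | Fi
    return cur

# Hypothetical example: all nonempty intervals over n = 8 points on a line.
n = 8
ranges = [frozenset(range(a, b)) for a in range(n) for b in range(a + 1, n + 1)]
pieces, k = decompose(ranges, n)
```

On this example every range is recovered exactly from its pieces, and the pieces at level $i$ have total size at most $n2^{-i+1}$, as the construction requires.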

Each range in $\mathcal{T}_i$ has size at most $n2^{-i+1}$ by construction; by Lemma 11, $|\mathcal{S}_i| = O(2^{di})$, and, since each
range in $\mathcal{S}_i$ corresponds to at most two ranges in $\mathcal{T}_i$, we also have $|\mathcal{T}_i| = O(2^{di})$. Let $T^i$ be the incidence
matrix of $\mathcal{T}_i$. The following lemma follows from the decomposition (2):

Lemma 12. Let $(P, \mathcal{R})$ be a range space with $|P| = n$ and shatter function $\pi_{\mathcal{R}}(s) = O(s^d)$. Let $A$ be
the incidence matrix of $\mathcal{R}$. Then, there exist matrices $T^i \in \{0, 1\}^{s_i \times n}$ and $Q^i \in \{0, \pm 1\}^{m \times s_i}$ such that
$A = \sum_{i=1}^{k} Q^i T^i$. Furthermore, we have the following properties for $T^i$ and $Q^i$:

• each row in $T^i$ has at most $n2^{-i+1}$ nonzero entries;

• $s_i \le C2^{di}$ for some absolute constant $C$;

• each row in $Q^i$ has at most 2 nonzero entries.

For the degree of a point $p \in P$ in the range space $\mathcal{T}_i$, we use the notation $d_i(p) = |\{R \in \mathcal{T}_i : p \in R\}|$.

Intuitively, we will use randomized response on those $\mathcal{T}_i$ consisting of only small ranges, and we will use
the Laplace noise mechanism on those $\mathcal{T}_i$ consisting of few ranges. The break-even point for the analysis
is $i_0 = (\log n)/d$. For $i > i_0$ randomized response gives the guarantee we need: the largest range in $\mathcal{T}_i$ for
$i > i_0$ has size at most $n^{1-1/d}$. However, $\mathcal{T}_i$ can have as many as $n$ ranges, and it seems that we cannot use
Laplace noise with variance $n^{1-1/d}$ and still preserve privacy for those $i$ close to $i_0$. To circumvent this issue,
we use the fact that we can bound both the largest range and the number of ranges in each $\mathcal{T}_i$ simultaneously.
The main observation is that we can add noise with optimal variance $O(n^{1-1/d})$ to the range counts for those
$\mathcal{T}_i$ where randomized response doesn't work, and bound the average privacy loss $\frac{1}{n}\sum_{p \in P} L_{\mathcal{M}}(p)$. Then, we
use averaging and Lemma 4 and argue that we can preserve privacy for most $p \in P$. The shatter function
bound does not increase for restrictions of $P$ and $\mathcal{R}$ and we can recurse on the remaining points of $P$. Our
algorithm for computing range counts over ranges with bounded shatter function is given as Algorithm 1.
The algorithm description and the following discussion assume that $\mathcal{R}$ has shatter function $\pi_{\mathcal{R}}(s) = O(s^d)$
(for $d \ge 2$) and the decomposition of Lemma 12 has already been computed. Note that the decomposition
can be computed in time $O(mn \log n)$.

We analyze the privacy guarantees of Algorithm 1. We first prove some technical claims about the
algorithm.



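Algorithm 1 RangeCount($P$, $x$, $\mathcal{R}$, $\varepsilon$, $\delta$)

Let $|P| = n$, $|\mathcal{R}| = m$;
Set $i_0 = (\log n)/d$;
Set $\varepsilon_i = \varepsilon (i_0 - i + 1)^{1.5} / n^{1/2 - 1/(2d)}$ for $i \le i_0$;
Set $\varepsilon_i = \varepsilon / (i - i_0 + 1)^{1.5}$ for $i > i_0$;
if $n \le 1$ then
    Let $p \in P$ be the only point in $P$. Return $\tilde{x}_p := x_p + \mathrm{Lap}(1/\varepsilon)$ for all $R \in \mathcal{R}$ s.t. $p \in R$, and $0$ for all other $R \in \mathcal{R}$.
end if
Set $X := \{p : \sum_{i \le i_0} d_i(p)\varepsilon_i^2 \le 12C\varepsilon^2\}$ and $\bar{X} := P \setminus X$;
Recursively compute RangeCount($\bar{X}$, $x|_{\bar{X}}$, $\mathcal{R}|_{\bar{X}}$, $\varepsilon$, $\delta$); let the results be $\tilde{z}^1_1, \ldots, \tilde{z}^1_m$.
for all $i \le i_0$ do
    Compute $\tilde{y}^i := (T^i|_X)(x|_X) + \mathrm{Lap}(1/\varepsilon_i)^{s_i}$;
end for
for all $i_0 < i \le k$ do
    Compute $\tilde{x}^i := x + \mathrm{Lap}(1/\varepsilon_i)^n$;
    Compute $\tilde{y}^i := (T^i|_X)(\tilde{x}^i|_X)$;
end for
Compute $\tilde{z}^2 := \sum_{i=1}^{k} Q^i \tilde{y}^i$;
Output $\tilde{z} = \tilde{z}^1 + \tilde{z}^2$.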
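
The per-level privacy parameters of Algorithm 1 can be tabulated with a few lines of Python. This is a minimal sketch under stated assumptions: logarithms are taken base 2 (so that $2^{d i_0} = n$), the exponent for levels above $i_0$ is 1.5 (the value consistent with the variance bounds of Lemma 14), and the function name `noise_schedule` is hypothetical.

```python
import math

def noise_schedule(n, d, k, eps):
    """Return (i0, {i: eps_i}) for levels i = 1..k, as read off Algorithm 1."""
    i0 = math.log2(n) / d  # assumed base-2 logarithm
    budgets = {}
    for i in range(1, k + 1):
        if i <= i0:
            # Few ranges per level: Laplace noise is added to the range counts.
            budgets[i] = eps * (i0 - i + 1) ** 1.5 / n ** (0.5 - 0.5 / d)
        else:
            # Small ranges per level: Laplace noise is added to the input x.
            budgets[i] = eps / (i - i0 + 1) ** 1.5
    return i0, budgets

i0, budgets = noise_schedule(n=2 ** 12, d=2, k=12, eps=0.5)
high_levels_total = sum(b for i, b in budgets.items() if i > i0)
```

For these example values `high_levels_total` stays below $2\varepsilon$, in line with the budget that claim 3 of Lemma 13 allots to the levels above $i_0$.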



Lemma 13. The following hold for Algorithm 1:

1. $|X| \ge n/2$.

2. $\{\tilde{y}^i\}_{i=1}^{i_0}$ is a $(2\sqrt{6C}\,\varepsilon\sqrt{\ln(1/\delta)}, \delta)$-differentially private function of $x|_X$.

3. $\{\tilde{x}^i\}_{i=i_0+1}^{k}$ is a $(2\varepsilon, 0)$-differentially private function of $x$. Moreover, for each $S \subseteq P$, $\{\tilde{x}^i|_S\}_{i=i_0+1}^{k}$ is
a $(2\varepsilon, 0)$-differentially private function of $x|_S$.

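Proof. Claim 1 follows by averaging and the inequality

$$\frac{1}{n}\sum_{p \in P}\sum_{i \le i_0} d_i(p)\,\varepsilon_i^2 \le 6C\varepsilon^2. \qquad (3)$$

Next we establish (3).

$$\frac{1}{n}\sum_{p \in P}\sum_{i \le i_0} d_i(p)\,\varepsilon_i^2 = \frac{1}{n}\sum_{i \le i_0}\varepsilon_i^2\sum_{p \in P} d_i(p) \le \frac{1}{n}\sum_{i \le i_0} C2^{di}\cdot n2^{-i+1}\cdot\frac{\varepsilon^2(i_0 - i + 1)^3}{n^{1-1/d}} = 2C\varepsilon^2\sum_{j=0}^{i_0 - 1}\frac{(j+1)^3}{2^{(d-1)j}} \le 6C\varepsilon^2.$$

The first inequality follows from Lemma 12. The second inequality holds for $d \ge 2$. This finishes the proof
of claim 1.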

The following privacy analysis uses the fact that the range space $(P, \mathcal{R})$ is public and, therefore, the
decomposition given by Lemma 12 and the set $X$ determined by the decomposition are public as well,
i.e. independent of $x$.

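Notice that each component of $\tilde{y}^i$ is an instance of the Laplace noise mechanism and, therefore, by
Lemma 5 it is $(\varepsilon_i, 0)$-differentially private. Also, $\tilde{y}^i_j$ is independent of $x_p$ whenever $T^i_{jp} = 0$ or $p \notin X$.
Denoting by $\tilde{y}^i_j(x)$ the random variable $\tilde{y}^i_j$ when the input is $x$, we have that

$$D_\infty\big(\tilde{y}^i_j(x) \,\|\, \tilde{y}^i_j(x \pm e_p)\big) \le \begin{cases} 0, & T^i_{jp} = 0 \text{ or } p \notin X \\ \varepsilon_i, & \text{otherwise.} \end{cases}$$

If $\mathcal{M}$ is the mechanism that outputs $\{\tilde{y}^i\}_{i=1}^{i_0}$, then, by the above discussion, $L_{\mathcal{M}}(p) = \sqrt{\sum_{i \le i_0} d_i(p)\varepsilon_i^2}$. By
the definition of $X$, we have that $L_{\mathcal{M}}(p) \le \sqrt{12C}\,\varepsilon$ for any $p \in X$ (and $L_{\mathcal{M}}(p) = 0$ for $p \notin X$). Claim 2 then
follows by Lemma 4.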

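By Lemma 2, each $\tilde{x}^i$ is $(\varepsilon_i, 0)$-differentially private. By Lemma 3, the composition $\{\tilde{x}^i\}_{i=i_0+1}^{k}$ is
$(\sum_{i=i_0+1}^{k}\varepsilon_i, 0)$-differentially private. Then claim 3 follows from

$$\sum_{i=i_0+1}^{k}\varepsilon_i \le \varepsilon\sum_{j=2}^{\infty}\frac{1}{j^{1.5}} \le 2\varepsilon.$$

This completes the proof of the lemma. □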



Theorem 6 (Privacy). Algorithm 1 preserves $((2\sqrt{6C} + 2)\varepsilon\sqrt{\ln(1/\delta)}, \delta)$-differential privacy.

Proof. We proceed by induction on n. 

Base case. When $n \le 1$, the output of Algorithm 1 is $(\varepsilon, 0)$-differentially private, since it is a function
of $\tilde{x}$, which is itself $(\varepsilon, 0)$-differentially private by the properties of the Laplace noise mechanism (Lemma 2).

Inductive step. Note that $\tilde{z}^2$ is a function of $\{\tilde{x}^i|_X\}_{i=i_0+1}^{k}$ and $\{\tilde{y}^i\}_{i=1}^{i_0}$. Also note that both $\{\tilde{x}^i|_X\}_{i=i_0+1}^{k}$ and $\{\tilde{y}^i\}_{i=1}^{i_0}$
depend only on $X$ and not on $\bar{X}$. By simple composition (Lemma 3) and Lemma 13, $\tilde{z}^2$ is a
$((2\sqrt{6C} + 2)\varepsilon\sqrt{\ln(1/\delta)}, \delta)$-differentially private function of $x|_X$. By Lemma 13, $|\bar{X}| \le n/2$, so by the inductive hypothesis
$\tilde{z}^1$ is a $((2\sqrt{6C} + 2)\varepsilon\sqrt{\ln(1/\delta)}, \delta)$-differentially private function of $x|_{\bar{X}}$. Since $X$ and $\bar{X}$ are disjoint, it follows
that $\tilde{z} = \tilde{z}^1 + \tilde{z}^2$ is a $((2\sqrt{6C} + 2)\varepsilon\sqrt{\ln(1/\delta)}, \delta)$-differentially private function of $x$. □

Next we analyze the approximation guarantee of the algorithm. The bounds in the following lemma can
be derived by a straightforward calculation.

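Lemma 14. Let $y^i = (T^i|_X)(x|_X)$. For each $j \in [m]$ and each $i \le i_0$, $\mathbb{E}[Q^i_{j*}\tilde{y}^i] = Q^i_{j*}y^i$, and
$\mathrm{Var}[Q^i_{j*}\tilde{y}^i] = O(n^{1-1/d}/(\varepsilon^2(i_0 - i + 1)^3))$.

Similarly, for each $j \in [m]$ and each $i > i_0$, $\mathbb{E}[Q^i_{j*}\tilde{y}^i] = Q^i_{j*}y^i$, and
$\mathrm{Var}[Q^i_{j*}\tilde{y}^i] = O(n^{1-1/d}(i - i_0 + 1)^3/(2^{i - i_0}\varepsilon^2))$.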
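
One way to carry out the calculation behind these bounds is sketched below. It assumes that the Laplace coordinates are independent, that $\mathrm{Lap}(b)$ has variance $2b^2$, and that logarithms are base 2 so that $n2^{-i_0} = n^{1-1/d}$; the explicit constants 4 and 16 are by-products of this sketch and are not claimed by the lemma.

```latex
% Case i <= i_0: a row of Q^i has at most 2 entries in {+1,-1}, and each
% coordinate of the noise vector is Lap(1/eps_i) with variance 2/eps_i^2.
\mathrm{Var}\big[Q^i_{j*}\tilde{y}^i\big]
  \le 2 \cdot \frac{2}{\varepsilon_i^2}
  = \frac{4\, n^{1-1/d}}{\varepsilon^2 (i_0 - i + 1)^3}.

% Case i > i_0: write c = Q^i_{j*} (T^i|_X). It is a signed sum of at most two
% 0/1 rows, each with at most n 2^{-i+1} ones, so ||c||_2^2 <= 4 n 2^{-i+1}.
\mathrm{Var}\big[Q^i_{j*}\tilde{y}^i\big]
  = \|c\|_2^2 \cdot \frac{2}{\varepsilon_i^2}
  \le \frac{16\, n 2^{-i} (i - i_0 + 1)^3}{\varepsilon^2}
  = \frac{16\, n^{1-1/d} (i - i_0 + 1)^3}{2^{\,i - i_0}\, \varepsilon^2}.
```

Unbiasedness in both cases follows because the Laplace noise has mean zero and the estimates are linear in the noise.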

We're now ready to prove an approximation guarantee. 

Theorem 7 (Utility). The expected average squared error of Algorithm 1 is $O(n^{1-1/d}/\varepsilon^2)$. With probability
at least $1 - \beta$, the worst-case squared error of Algorithm 1 is at most $O(n^{1-1/d}\log(n/\beta)/\varepsilon^2)$.

Proof. Let $z^2 = \sum_{i=1}^{k} Q^i y^i$. Note that all $\tilde{y}^i$ have independent noise. Then, by Lemma 14, for each $j \in [m]$,
$\mathbb{E}[\tilde{z}^2_j] = z^2_j$ and $\mathrm{Var}[\tilde{z}^2_j] = O(n^{1-1/d}/\varepsilon^2)$. The expected total squared error of Algorithm 1 is, by linearity of
expectation, $\sum_j \mathrm{Var}[\tilde{z}_j]$. Since $\tilde{z}^1$ is independent from $\tilde{z}^2$, we have $\sum_j \mathrm{Var}[\tilde{z}_j] = \sum_j \mathrm{Var}[\tilde{z}^1_j] + \sum_j \mathrm{Var}[\tilde{z}^2_j]$.
By claim 1 in Lemma 13, the first term is the result of a recursive call on an input of size at most $n/2$. We can
express the expected squared error as a function $E(n)$ recursively as $E(n) = E(n/2) + O(n^{1-1/d}/\varepsilon^2)$, which
is easily seen to resolve to $E(n) = O(n^{1-1/d}/\varepsilon^2)$.

The worst-case guarantee can be derived by standard use of tail bounds for sums of Laplace random 
variables. □ 
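
For completeness, the recurrence can be unrolled as follows. Here $c$ denotes the constant hidden in the $O(\cdot)$ term; it is introduced only for this sketch.

```latex
E(n) \le \frac{c}{\varepsilon^2} \sum_{t \ge 0} \Big(\frac{n}{2^t}\Big)^{1 - 1/d}
     = \frac{c\, n^{1-1/d}}{\varepsilon^2} \sum_{t \ge 0} 2^{-t(1 - 1/d)}
     = \frac{c\, n^{1-1/d}}{\varepsilon^2 \big(1 - 2^{-(1 - 1/d)}\big)}.
```

For $d \ge 2$ we have $1 - 1/d \ge 1/2$, so the geometric factor is at most $1/(1 - 2^{-1/2}) < 4$, and the recursion costs only a constant factor over the top-level term.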

6 Extensions 

Algorithms for halfspace range counting can be derived from several other methods, each of which provides 
weaker noise guarantees and/or less generality. 

The partition trees of Chan [6] imply a way to factor the incidence matrix $A$ of a range space induced
by $d$-dimensional halfspaces into matrices $Q$ and $D$ such that $A = QD$, each column in $D$ has at most
$O(\log\log n)$ nonzero elements, each row in $Q$ has at most $O(n^{1-1/d})$ nonzero elements, and $Q$ and $D$ both
have elements bounded in absolute value by 1. Using Lemma 4 we can add Laplace noise with variance
$O(\frac{1}{\varepsilon^2}\log\log n)$ to each element of $Dx$, preserving $(\varepsilon\sqrt{\ln(1/\delta)}, \delta)$ privacy. We can then bound the variance of
this mechanism to argue that, with constant probability, the average squared error is $O(\frac{1}{\varepsilon^2} n^{1-1/d}\log\log n)$
and the worst case squared error is $O(\frac{1}{\varepsilon^2} n^{1-1/d}\log n\log\log n)$.
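
The mechanism described above has a simple generic form once a factorization $A = QD$ is available. The Python sketch below takes $Q$ and $D$ as given (it does not construct partition trees), and the function name `factorization_mechanism` as well as the exact noise calibration are assumptions made for illustration; the privacy guarantee itself rests on the lemma cited in the text, not on this code.

```python
import math
import numpy as np

def factorization_mechanism(Q, D, x, eps, rng):
    """Release Q (D x + noise), with Laplace noise on each entry of D x."""
    # Largest number of nonzeros in any column of D (O(log log n) in the text).
    col_sparsity = max(1, int(np.count_nonzero(D, axis=0).max()))
    # Assumed calibration: per-coordinate parameter eps / sqrt(col_sparsity),
    # so each noise term has variance 2 * col_sparsity / eps**2.
    scale = math.sqrt(col_sparsity) / eps
    noisy = D @ x + rng.laplace(0.0, scale, size=D.shape[0])
    # Each answer sums at most (row sparsity of Q) independent noise terms.
    return Q @ noisy

rng = np.random.default_rng(0)
Q = np.array([[1, 1, 0], [0, 1, 1]])
D = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]])
x = np.array([3, 1, 4, 1])
answers = factorization_mechanism(Q, D, x, eps=1.0, rng=rng)
```

Since each row of $Q$ has few nonzero entries, the variance of each released count is the per-coordinate variance times that row sparsity, which is the source of the $n^{1-1/d}\log\log n$ factor in the average error.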

Welzl [31], and Chazelle and Welzl [10] gave an algorithm that, given a set of points $P$ in $\mathbb{R}^d$, computes
a spanning path such that any hyperplane intersects the path in at most $O(n^{1-1/d})$ components. Then the
intersection of any halfspace with $P$ can be represented as the union of $O(n^{1-1/d})$ disjoint intervals on the
spanning path. An algorithm for privately computing interval counting queries, e.g. the algorithm from [5],
can be used with the spanning path as input, giving average squared error $O(\frac{1}{\varepsilon^2} n^{1-1/d}\log n)$ and worst
case squared error $O(\frac{1}{\varepsilon^2} n^{1-1/d}\log n)$. Interestingly, the spanning path approach generalizes to range spaces
whose dual shatter function is bounded by a polynomial with exponent $d$.

There is a well-known connection between combinatorial discrepancy and epsilon approximations (cf. [26],
Chapter 1). Let $(P, \mathcal{R})$ be a range space such that the maximum discrepancy over all restrictions of $\mathcal{R}$ to
a size $s$ subset of $P$ is $f(s)$ (this is the same $f(s)$ as in Section 3). Under some reasonable assumptions on
the range space, there exists a subset $S$ of $P$ of size $s$ such that range counts on $S$ are close to range counts
on $P$ to within an additive $\frac{n}{s} f(s)$. Using this fact, and the discrepancy upper bound for range spaces with
shatter function exponent $d$, we can apply the median mechanism of Roth and Roughgarden [28] with the
new analysis in [19] to obtain a squared error upper bound that depends on $n$ as $O(n^{2d/(2d+1)})$. This upper
bound is suboptimal; for example, for $d = 2$, it yields an upper bound of $n^{4/5}$ as opposed to the optimal $n^{1/2}$.
Nevertheless, this method still gives squared error bounds that grow slower than $n$ for range systems with
polynomial shatter function. It also extends to the case where the universe is much larger than $\|x\|_1$. Giving
optimal or near optimal error upper bounds in this large universe regime is an interesting open problem.

7 Concluding Remarks 

While predicate count queries ($Ax$) have been studied in differential privacy before, we make some of the first
significant progress in understanding the complexity of the problem in terms of the combinatorial properties
of $A$, in particular for halfspace, orthogonal, and other range count queries. Our main results are tight upper
and lower bounds on the approximation of $(\varepsilon, \delta)$-differentially private halfspace count queries. Our approach is
via a variation of discrepancy. The main problems we leave open are to get tight bounds for orthogonal
counts with $(\varepsilon, \delta)$-differential privacy and to extend our bounds to the large universe regime.